ZOMATO¶

image.png

Introduction¶

Food. Everyone owns it and loves it. Even more people discuss it. We could talk for hours and hours about food. India is appropriately known as the "Land of Spices." India produces the widest variety of spices of any nation in the world. Due to the advent of numerous domestic and foreign businesses, the restaurant industry in India has seen an extraordinary transformation. A great need for qualified experts in the sector and other linked industries has resulted from this. In order to attract more clients and provide them with better service, Indian restaurants have now gone online thanks to the technological revolution.

The demand and supply graph, however, is not quite how it should be. The restaurant business offers a wide range of opportunities due to a visible shortage of competent workers. Here come the culinary arts schools. In order to meet business expectations, conventional cookery schools and hotel management colleges have now broadened the scope of their educational offerings. Universities in India are spending time and money to train students and prepare them for the workforce.

It's not unexpected that the market for the food services industry has changed as a result of the increased frequency of eating out. The Indian food service industry has advanced significantly since the early 1990s, when it was dominated by small, unorganised businesses.

The revolution started in 1996 when companies including McDonald's, Pizza Hut, Dominos Pizza, Subway, and Yo!China opened stores there. The market for food services has been expanding ever since. The good news is that the food services industry is expected to continue growing for many years to come thanks to factors like rising disposable incomes, a growing young population, an increase in consumers in smaller towns, increased exposure to different cultures and cuisines, and a rising propensity to eat out. The analysis will primarily assist new restaurants in looking at the elements affecting the location of their business.

Purpose of Study¶

The primary goal of the Zomato dataset analysis is to gain a clear understanding of the variables influencing the overall rating of each restaurant. Different types of restaurants have been established in various locations, with Bengaluru having more than 50,000 restaurants that serve food from all over the world. Since there are new restaurants launching every day, the market is still young and there is still a growing need. However, despite rising demand, it has becoming more challenging for new restaurants to compete with existing ones. The majority of them serve similar cuisine. India's IT capital is Bengaluru. Since most people don't have the time to prepare their own meals, the majority of the locals rely primarily on restaurant fare. It is now crucial to research a location's demography because of the strong demand for eateries. What kind of food is most widely consumed there. The entire community enjoys eating vegetarian food. If so, does that area mostly consist of members of one particular religious group, such as Jain, Marwari, or Gujarati vegetarians?

Problem Statement¶

We have always been intrigued by Bengaluru's culinary scene. Bengaluru is home to restaurants from all over the world. You can find all types of cuisines in this place, from the United States to Japan, Russia to Antarctica. You name it, Bengaluru has it. The best city for foodies is Bengaluru. Restaurants are becoming more numerous every day. Whose number currently stands at 12,000 eateries and has so many dining establishments. This market has not yet reached saturation. Additionally, new eateries are appearing every day. They now find it challenging to compete with restaurants that have already achieved success. The main problems that they continue to face includes high real estate prices, rising food prices, a lack of qualified workers, a disjointed supply chain, and overlicensing.

This research intends to analyse the area's demography and culinary culture. The most significant benefit is that it will assist new restaurants in selecting their theme, menus, cuisine, price, etc. for a certain area. Additionally, it looks for culinary similarities among Bengaluru neighbourhoods. People will be able to select a restaurant based on the analysis and a number of other criteria.

The project's main goal is to attempt a response to the question based on the interest of restaurants and foodies. And what considerations ought to be made if a new restaurant is to be opened.

Content¶

The dataset contains 17 variables all of which were scraped from the zomato website. The dataset contains details of more than 50,000 restaurants in Bengaluru, in each of its neighborhood. The data is correct to the best of my knowledge, to that available on the zomato website until 15 March 2019.

image.png

Importing Data Pre-processing and Data Visualizing library.¶

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.express as px
import plotly.graph_objs as go
from geopy.geocoders import Nominatim
import folium
from folium.plugins import HeatMap
from folium.plugins import FastMarkerCluster
from plotly import tools
import re
from plotly.offline import init_notebook_mode, plot, iplot
from wordcloud import WordCloud, STOPWORDS 
from warnings import filterwarnings
filterwarnings('ignore')
In [2]:
df = pd.read_csv("zomato.csv")
In [3]:
df.head()
Out[3]:
url address name online_order book_table rate votes phone location rest_type dish_liked cuisines approx_cost(for two people) reviews_list menu_item listed_in(type) listed_in(city)
0 https://www.zomato.com/bangalore/jalsa-banasha... 942, 21st Main Road, 2nd Stage, Banashankari, ... Jalsa Yes Yes 4.1/5 775 080 42297555\r\n+91 9743772233 Banashankari Casual Dining Pasta, Lunch Buffet, Masala Papad, Paneer Laja... North Indian, Mughlai, Chinese 800 [('Rated 4.0', 'RATED\n A beautiful place to ... [] Buffet Banashankari
1 https://www.zomato.com/bangalore/spice-elephan... 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... Spice Elephant Yes No 4.1/5 787 080 41714161 Banashankari Casual Dining Momos, Lunch Buffet, Chocolate Nirvana, Thai G... Chinese, North Indian, Thai 800 [('Rated 4.0', 'RATED\n Had been here for din... [] Buffet Banashankari
2 https://www.zomato.com/SanchurroBangalore?cont... 1112, Next to KIMS Medical College, 17th Cross... San Churro Cafe Yes No 3.8/5 918 +91 9663487993 Banashankari Cafe, Casual Dining Churros, Cannelloni, Minestrone Soup, Hot Choc... Cafe, Mexican, Italian 800 [('Rated 3.0', "RATED\n Ambience is not that ... [] Buffet Banashankari
3 https://www.zomato.com/bangalore/addhuri-udupi... 1st Floor, Annakuteera, 3rd Stage, Banashankar... Addhuri Udupi Bhojana No No 3.7/5 88 +91 9620009302 Banashankari Quick Bites Masala Dosa South Indian, North Indian 300 [('Rated 4.0', "RATED\n Great food and proper... [] Buffet Banashankari
4 https://www.zomato.com/bangalore/grand-village... 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... Grand Village No No 3.8/5 166 +91 8026612447\r\n+91 9901210005 Basavanagudi Casual Dining Panipuri, Gol Gappe North Indian, Rajasthani 600 [('Rated 4.0', 'RATED\n Very good restaurant ... [] Buffet Banashankari
In [4]:
df.shape
Out[4]:
(51717, 17)
In [5]:
df.columns
Out[5]:
Index(['url', 'address', 'name', 'online_order', 'book_table', 'rate', 'votes',
       'phone', 'location', 'rest_type', 'dish_liked', 'cuisines',
       'approx_cost(for two people)', 'reviews_list', 'menu_item',
       'listed_in(type)', 'listed_in(city)'],
      dtype='object')
In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51717 entries, 0 to 51716
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   url                          51717 non-null  object
 1   address                      51717 non-null  object
 2   name                         51717 non-null  object
 3   online_order                 51717 non-null  object
 4   book_table                   51717 non-null  object
 5   rate                         43942 non-null  object
 6   votes                        51717 non-null  int64 
 7   phone                        50509 non-null  object
 8   location                     51696 non-null  object
 9   rest_type                    51490 non-null  object
 10  dish_liked                   23639 non-null  object
 11  cuisines                     51672 non-null  object
 12  approx_cost(for two people)  51371 non-null  object
 13  reviews_list                 51717 non-null  object
 14  menu_item                    51717 non-null  object
 15  listed_in(type)              51717 non-null  object
 16  listed_in(city)              51717 non-null  object
dtypes: int64(1), object(16)
memory usage: 6.7+ MB

Cleaning Data¶

In [7]:
df.isnull().sum()
Out[7]:
url                                0
address                            0
name                               0
online_order                       0
book_table                         0
rate                            7775
votes                              0
phone                           1208
location                          21
rest_type                        227
dish_liked                     28078
cuisines                          45
approx_cost(for two people)      346
reviews_list                       0
menu_item                          0
listed_in(type)                    0
listed_in(city)                    0
dtype: int64
In [8]:
df.duplicated().sum()
Out[8]:
0

Online order and Book table¶

In [9]:
df.online_order.replace(('Yes','No'),(True, False), inplace = True)
df.book_table.replace(('Yes','No'),(True, False), inplace = True)

Rate¶

In [10]:
df['rate'].unique()
Out[10]:
array(['4.1/5', '3.8/5', '3.7/5', '3.6/5', '4.6/5', '4.0/5', '4.2/5',
       '3.9/5', '3.1/5', '3.0/5', '3.2/5', '3.3/5', '2.8/5', '4.4/5',
       '4.3/5', 'NEW', '2.9/5', '3.5/5', nan, '2.6/5', '3.8 /5', '3.4/5',
       '4.5/5', '2.5/5', '2.7/5', '4.7/5', '2.4/5', '2.2/5', '2.3/5',
       '3.4 /5', '-', '3.6 /5', '4.8/5', '3.9 /5', '4.2 /5', '4.0 /5',
       '4.1 /5', '3.7 /5', '3.1 /5', '2.9 /5', '3.3 /5', '2.8 /5',
       '3.5 /5', '2.7 /5', '2.5 /5', '3.2 /5', '2.6 /5', '4.5 /5',
       '4.3 /5', '4.4 /5', '4.9/5', '2.1/5', '2.0/5', '1.8/5', '4.6 /5',
       '4.9 /5', '3.0 /5', '4.8 /5', '2.3 /5', '4.7 /5', '2.4 /5',
       '2.1 /5', '2.2 /5', '2.0 /5', '1.8 /5'], dtype=object)
In [11]:
df['rate'].replace(np.nan, '', regex=True, inplace=True)
df['rate'].replace('-', '', regex=True, inplace=True)
df['rate'].unique()
Out[11]:
array(['4.1/5', '3.8/5', '3.7/5', '3.6/5', '4.6/5', '4.0/5', '4.2/5',
       '3.9/5', '3.1/5', '3.0/5', '3.2/5', '3.3/5', '2.8/5', '4.4/5',
       '4.3/5', 'NEW', '2.9/5', '3.5/5', '', '2.6/5', '3.8 /5', '3.4/5',
       '4.5/5', '2.5/5', '2.7/5', '4.7/5', '2.4/5', '2.2/5', '2.3/5',
       '3.4 /5', '3.6 /5', '4.8/5', '3.9 /5', '4.2 /5', '4.0 /5',
       '4.1 /5', '3.7 /5', '3.1 /5', '2.9 /5', '3.3 /5', '2.8 /5',
       '3.5 /5', '2.7 /5', '2.5 /5', '3.2 /5', '2.6 /5', '4.5 /5',
       '4.3 /5', '4.4 /5', '4.9/5', '2.1/5', '2.0/5', '1.8/5', '4.6 /5',
       '4.9 /5', '3.0 /5', '4.8 /5', '2.3 /5', '4.7 /5', '2.4 /5',
       '2.1 /5', '2.2 /5', '2.0 /5', '1.8 /5'], dtype=object)
In [12]:
df = df.loc[df.rate !='NEW']
df = df.loc[df.rate !=''].reset_index(drop=True)
remove_slash = lambda x: x.replace('/5', '') if type(x) == np.str else x
df.rate = df.rate.apply(remove_slash).str.strip().astype('float')
df['rate'].head
Out[12]:
<bound method NDFrame.head of 0        4.1
1        4.1
2        3.8
3        3.7
4        3.8
        ... 
41660    3.7
41661    2.5
41662    3.6
41663    4.3
41664    3.4
Name: rate, Length: 41665, dtype: float64>

Phone¶

In [13]:
df[~df['phone'].str.contains('[0-9+]',  na = False)]['phone'].unique()
Out[13]:
array([nan], dtype=object)
In [14]:
df['phone'].replace(np.nan, '', regex = True, inplace = True)

Location¶

In [15]:
df["location"].dropna(inplace = True)

Rest_Type¶

In [16]:
df['rest_type'].replace(np.nan, '', regex=True, inplace=True)

Dish Liked¶

In [17]:
df['dish_liked'].replace(np.nan, '', regex=True, inplace=True)

Cuisines¶

In [18]:
df['cuisines'].replace(np.nan, '', regex=True, inplace=True)

Approx cost (for two people)¶

In [19]:
df = df.rename(columns = {'approx_cost(for two people)':'cost_for_2', 'listed_in(type)':'listed_type', 'listed_in(city)':'city'})
In [20]:
df['cost_for_2'] = df['cost_for_2'].astype(str)
df['cost_for_2'] = df['cost_for_2'].apply(lambda x: x.replace(',','.'))
df['cost_for_2'] = df['cost_for_2'].astype(float)
In [21]:
df["cost_for_2"].dropna()
Out[21]:
0        800.0
1        800.0
2        800.0
3        300.0
4        600.0
         ...  
41660    800.0
41661    800.0
41662      1.5
41663      2.5
41664      1.5
Name: cost_for_2, Length: 41418, dtype: float64
In [22]:
df["cost_for_2"].isnull().sum()
Out[22]:
247
In [23]:
df.dropna(subset=["cost_for_2"], inplace=True)
In [24]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41418 entries, 0 to 41664
Data columns (total 17 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   url           41418 non-null  object 
 1   address       41418 non-null  object 
 2   name          41418 non-null  object 
 3   online_order  41418 non-null  bool   
 4   book_table    41418 non-null  bool   
 5   rate          41418 non-null  float64
 6   votes         41418 non-null  int64  
 7   phone         41418 non-null  object 
 8   location      41418 non-null  object 
 9   rest_type     41418 non-null  object 
 10  dish_liked    41418 non-null  object 
 11  cuisines      41418 non-null  object 
 12  cost_for_2    41418 non-null  float64
 13  reviews_list  41418 non-null  object 
 14  menu_item     41418 non-null  object 
 15  listed_type   41418 non-null  object 
 16  city          41418 non-null  object 
dtypes: bool(2), float64(2), int64(1), object(12)
memory usage: 5.1+ MB
In [25]:
df.head()
Out[25]:
url address name online_order book_table rate votes phone location rest_type dish_liked cuisines cost_for_2 reviews_list menu_item listed_type city
0 https://www.zomato.com/bangalore/jalsa-banasha... 942, 21st Main Road, 2nd Stage, Banashankari, ... Jalsa True True 4.1 775 080 42297555\r\n+91 9743772233 Banashankari Casual Dining Pasta, Lunch Buffet, Masala Papad, Paneer Laja... North Indian, Mughlai, Chinese 800.0 [('Rated 4.0', 'RATED\n A beautiful place to ... [] Buffet Banashankari
1 https://www.zomato.com/bangalore/spice-elephan... 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... Spice Elephant True False 4.1 787 080 41714161 Banashankari Casual Dining Momos, Lunch Buffet, Chocolate Nirvana, Thai G... Chinese, North Indian, Thai 800.0 [('Rated 4.0', 'RATED\n Had been here for din... [] Buffet Banashankari
2 https://www.zomato.com/SanchurroBangalore?cont... 1112, Next to KIMS Medical College, 17th Cross... San Churro Cafe True False 3.8 918 +91 9663487993 Banashankari Cafe, Casual Dining Churros, Cannelloni, Minestrone Soup, Hot Choc... Cafe, Mexican, Italian 800.0 [('Rated 3.0', "RATED\n Ambience is not that ... [] Buffet Banashankari
3 https://www.zomato.com/bangalore/addhuri-udupi... 1st Floor, Annakuteera, 3rd Stage, Banashankar... Addhuri Udupi Bhojana False False 3.7 88 +91 9620009302 Banashankari Quick Bites Masala Dosa South Indian, North Indian 300.0 [('Rated 4.0', "RATED\n Great food and proper... [] Buffet Banashankari
4 https://www.zomato.com/bangalore/grand-village... 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... Grand Village False False 3.8 166 +91 8026612447\r\n+91 9901210005 Basavanagudi Casual Dining Panipuri, Gol Gappe North Indian, Rajasthani 600.0 [('Rated 4.0', 'RATED\n Very good restaurant ... [] Buffet Banashankari

Exploratory Data Analysis¶

Geospatial Analysis¶

In [26]:
len(df['location'].unique())
Out[26]:
92
In [27]:
locations = pd.DataFrame({"Name":df['location'].unique()})
In [28]:
geolocator = Nominatim(user_agent = "app")
In [29]:
lat = []
lon = []
for location in locations['Name']:
    location = geolocator.geocode(location)    
    if location is None:
        lat.append(np.nan)
        lon.append(np.nan)
    else:
        lat.append(location.latitude)
        lon.append(location.longitude)
In [30]:
locations['lat'] = lat
locations['lon'] = lon
In [31]:
locations.head()
Out[31]:
Name lat lon
0 Banashankari 15.887678 75.704678
1 Basavanagudi 12.941726 77.575502
2 Mysore Road 12.946662 77.530090
3 Jayanagar 27.643927 83.052805
4 Kumaraswamy Layout 12.908149 77.555318
In [32]:
R_locations = pd.DataFrame(df['location'].value_counts().reset_index())
In [33]:
R_locations.columns=['Name','count']
R_locations.head()
Out[33]:
Name count
0 BTM 3906
1 Koramangala 5th Block 2297
2 HSR 2004
3 Indiranagar 1803
4 JP Nagar 1717
In [34]:
print(locations.shape)
print(R_locations.shape)
(92, 3)
(92, 2)
In [35]:
Restaurant_locations = R_locations.merge(locations, on = 'Name', how = "left").dropna()
Restaurant_locations.head()
Out[35]:
Name count lat lon
0 BTM 3906 45.954851 -112.496595
1 Koramangala 5th Block 2297 12.934843 77.618977
2 HSR 2004 18.147500 41.538889
3 Indiranagar 1803 12.973291 77.640467
4 JP Nagar 1717 12.265594 76.646540
In [36]:
Restaurant_locations['count'].max()
Out[36]:
3906
In [37]:
def generateBaseMap(default_location = [12.97, 77.59], default_zoom_start=12):
    base_map = folium.Map(location = default_location, zoom_start = default_zoom_start)
    return base_map
In [38]:
basemap = generateBaseMap()

Base Map for Bengaluru¶

In [39]:
basemap
Out[39]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [40]:
Restaurant_locations[['lat','lon','count']]
Out[40]:
lat lon count
0 45.954851 -112.496595 3906
1 12.934843 77.618977 2297
2 18.147500 41.538889 2004
3 12.973291 77.640467 1803
4 12.265594 76.646540 1717
... ... ... ...
87 13.100698 77.596345 4
88 12.984852 77.540063 3
89 12.927441 77.515522 2
90 13.001970 77.528839 1
91 13.032942 77.527325 1

91 rows × 3 columns

Heatmap for Restaurants in Bengaluru:¶
In [41]:
HeatMap(Restaurant_locations[['lat','lon','count']],zoom = 20,radius = 15).add_to(basemap)
basemap
Out[41]:
Make this Notebook Trusted to load map: File -> Trust Notebook
Marker Cluster Map for Restaurants in Bengaluru.¶
In [42]:
FastMarkerCluster(data=Restaurant_locations[['lat','lon','count']].values.tolist()).add_to(basemap)
basemap
Out[42]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [43]:
df.rate.replace('NEW', 0, inplace = True)
df.rate.replace('', 0, inplace = True)
In [44]:
df['rate'] = pd.to_numeric(df['rate'])
df.groupby(['location'])['rate'].mean().sort_values(ascending = False)
Out[44]:
location
Lavelle Road             4.141788
Koramangala 3rd Block    4.020419
St. Marks Road           4.017201
Koramangala 5th Block    4.006661
Church Street            3.992125
                           ...   
Rammurthy Nagar          3.346154
North Bangalore          3.340000
Peenya                   3.200000
Bommanahalli             3.190972
Old Madras Road          3.181818
Name: rate, Length: 92, dtype: float64
In [45]:
df.groupby(['location'])['rate'].mean()
Out[45]:
location
BTM                  3.571659
Banashankari         3.649866
Banaswadi            3.492161
Bannerghatta Road    3.506260
Basavanagudi         3.671092
                       ...   
West Bangalore       3.366667
Whitefield           3.623209
Wilson Garden        3.536364
Yelahanka            3.700000
Yeshwantpur          3.502679
Name: rate, Length: 92, dtype: float64
In [46]:
avg_rating = df.groupby(['location'])['rate'].mean().values
In [47]:
loc = df.groupby(['location'])['rate'].mean().index
In [48]:
geolocator = Nominatim(user_agent = "app")
In [49]:
lat=[]
lon=[]
for location in loc:
    location = geolocator.geocode(location)    
    if location is None:
        lat.append(np.nan)
        lon.append(np.nan)
    else:
        lat.append(location.latitude)
        lon.append(location.longitude)
In [50]:
rating = pd.DataFrame()
rating['location'] = loc
rating['lat'] = lat
rating['lon'] = lon
rating['avg_rating'] = avg_rating
In [51]:
rating.isna().sum()
Out[51]:
location      0
lat           1
lon           1
avg_rating    0
dtype: int64
In [52]:
rating=rating.dropna()
Marker Cluster Map according to average ratings of Restaurants.¶
In [53]:
HeatMap(rating[['lat','lon','avg_rating']],zoom = 20,radius = 15).add_to(basemap)
basemap
Out[53]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [54]:
df2 = df[df['cuisines'] == 'North Indian']
df2.head()
Out[54]:
url address name online_order book_table rate votes phone location rest_type dish_liked cuisines cost_for_2 reviews_list menu_item listed_type city
5 https://www.zomato.com/bangalore/timepass-dinn... 37, 5-1, 4th Floor, Bosco Court, Gandhi Bazaar... Timepass Dinner True False 3.8 286 +91 9980040002\r\n+91 9980063005 Basavanagudi Casual Dining Onion Rings, Pasta, Kadhai Paneer, Salads, Sal... North Indian 600.0 [('Rated 3.0', 'RATED\n Food 3/5\nAmbience 3/... [] Buffet Banashankari
50 https://www.zomato.com/bangalore/petoo-banasha... 276, Ground Floor, 100 Feet Outer Ring Road, B... Petoo False False 3.7 21 +91 8026893211 Banashankari Quick Bites North Indian 450.0 [('Rated 2.0', 'RATED\n This is a neatly made... [] Delivery Banashankari
84 https://www.zomato.com/bangalore/krishna-sagar... 38, 22nd Main, 22nd Cross, Opposite BDA, 2nd S... Krishna Sagar False False 3.5 31 +91 8892752997\r\n+91 7204780429 Banashankari Quick Bites North Indian 200.0 [('Rated 1.0', 'RATED\n Worst experience with... [] Delivery Banashankari
88 https://www.zomato.com/bangalore/nandhini-delu... 304, Opposite Apollo Public School, 100 Feet R... Nandhini Deluxe False False 2.6 283 080 26890011\r\n080 26890033 Banashankari Casual Dining Biryani, Chicken Guntur, Thali, Buttermilk, Ma... North Indian 600.0 [('Rated 3.0', 'RATED\n Ididnt like much.\n\n... [] Delivery Banashankari
102 https://www.zomato.com/bangalore/katriguppe-do... 8, Katriguppe Main Road, Vivekananda Nagar, 3r... Katriguppe Donne Biryani False False 3.2 4 +91 9964847091 Banashankari Quick Bites North Indian 300.0 [] [] Delivery Banashankari
In [55]:
north_india = df2.groupby('location')['url'].count().reset_index()
north_india.columns = ['Name','count']
north_india.head()
Out[55]:
Name count
0 BTM 241
1 Banashankari 27
2 Banaswadi 9
3 Bannerghatta Road 57
4 Basavanagudi 16
In [56]:
north_india = north_india.merge(locations, on = "Name", how = 'left').dropna()
Heatmap for 'North Indian' Cuisines.¶
In [57]:
basemap=generateBaseMap()
HeatMap(north_india[['lat','lon','count']].values.tolist(),zoom=20,radius=15).add_to(basemap)
basemap
Out[57]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [58]:
def Heatmap_Zone(zone):
    df3 = df[df['cuisines'] == zone]
    df_zone = df3.groupby(['location'],as_index=False)['url'].agg('count')
    df_zone.columns = ['Name','count']
    df_zone = df_zone.merge(locations,on="Name",how='left').dropna()
    basemap = generateBaseMap()
    HeatMap(df_zone[['lat','lon','count']].values.tolist(),zoom=20,radius=15).add_to(basemap)
    return basemap
In [59]:
df['cuisines'].unique()
Out[59]:
array(['North Indian, Mughlai, Chinese', 'Chinese, North Indian, Thai',
       'Cafe, Mexican, Italian', ..., 'Tibetan, Nepalese',
       'North Indian, Street Food, Biryani',
       'North Indian, Chinese, Arabian, Momos'], dtype=object)
Heatmap for 'South Indian' Cuisines:¶
In [60]:
Heatmap_Zone('South Indian')
Out[60]:
Make this Notebook Trusted to load map: File -> Trust Notebook
Heatmap for 'Italian' Cuisines:¶
In [61]:
Heatmap_Zone('Italian')
Out[61]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Regular Analysis¶

Is Online delivering available?¶

In [62]:
labels = ["Accepted",'Not Accepted']
values = df['online_order'].value_counts()
colors = ['pink', 'darkgreen']
fig = go.Figure(data=[go.Pie(labels=labels,
                             values=values,hole=.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=3)))
fig.update_layout(title="Online delivering available? ",
                  titlefont={'size': 30},      
                  )
fig.show()

Is Table booking available?¶

In [63]:
labels = ["Accepted",'Not Accepted']
values = df['book_table'].value_counts()
colors = ['cyan', 'darkgreen']
fig = go.Figure(data=[go.Pie(labels=labels,
                             values=values,hole=.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=3)))
fig.update_layout(title="Table booking available? ",
                  titlefont={'size': 30},
                  )
fig.show()

Most popular cuisines of Bangalore¶

In [64]:
values = df['cuisines'].value_counts()[:20]
labels=values.index
text=values.index
fig = go.Figure(data=[go.Pie(values=values,labels=labels,hole=.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=3)))
fig.update_layout(title="Most popular cuisines of Bangalore ",
                  titlefont={'size': 30},
                  )
fig.show()

Cost Comparision¶

In [65]:
fig = px.box(df, x = 'online_order', y = 'cost_for_2', color = 'online_order')

fig.update_layout(title = "Cost comparison for Online order",
                  titlefont={'size': 30},template = 'simple_white'
                  )
fig.show()
In [66]:
dfupd = df.copy()
dfupd['update_dish_liked'] = dfupd['dish_liked'].apply(lambda x : x.split(',') if type(x)==str else [''])
rest = dfupd['rest_type'].value_counts()[:9].index
In [67]:
def produce_wordcloud(rest):
    
    plt.figure(figsize=(20,30))
    for i,restaurant in enumerate(rest):
        plt.subplot(3,3,i+1)
        dishes=''
        data=dfupd[dfupd['rest_type']==restaurant]
        for word in data['dish_liked']:
            words=word.split()
            # Converts each token into lowercase 
            for i in range(len(words)): 
                words[i] = words[i].lower() 
            dishes=dishes+ " ".join(words)+" "
        wordcloud = WordCloud(max_font_size=None, background_color='black', collocations=False,stopwords = stopwords,width=1200, height=1200).generate(dishes)
        plt.imshow(wordcloud)
        plt.title(restaurant)
        plt.axis("off")

WordCloud for Dishes in each Restaurant Type¶

In [68]:
stopwords = set(STOPWORDS) 
produce_wordcloud(rest)
In [69]:
def reviewwords(restaurant):
    dataset=dfupd[dfupd['rest_type']==restaurant]
    total_review=' '
    for review in dataset['reviews_list']:
        review=review.lower()
        review=re.sub('[^a-zA-Z]', ' ',review)
        review=re.sub('rated', ' ',review)
        review=re.sub('x',' ',review)
        review=re.sub(' +',' ',review)
        total_review=total_review + str(review)
    wordcloud = WordCloud(width = 800, height = 800, 
            background_color ='black', 
            stopwords = set(STOPWORDS), 
            min_font_size = 10).generate(total_review) 
    # plot the WordCloud image                        
    plt.figure(figsize = (8, 8)) 
    plt.imshow(wordcloud) 
    plt.axis("off")

WordCloud for reviews¶

In [70]:
reviewwords('Quick Bites')
In [71]:
reviewwords('Cafe')
In [72]:
reviewwords('Delivery')

Count of total Restaurants in each Locations¶

In [73]:
fig=px.bar(x = df['city'].unique(),y = df['city'].value_counts(), labels = dict(x = "City Name", y = "Total Count"),color_continuous_scale = "Cividis", color = df['city'].unique())
fig.update_layout(title="Location wise counts for Restaurants ",
                  titlefont={'size': 30},template='simple_white'     
                  )
fig.update_traces(marker_line_color='black',
                  marker_line_width=2, opacity=1)
fig.show()

Loaction wise ratings of Restaurants.¶

In [74]:
loc_plt=pd.crosstab(df2['rate'],df2['city'])
fig=px.bar(loc_plt,x=loc_plt.index,y=loc_plt.columns,barmode='stack',opacity=1)
fig.update_layout(title="Location wise Rating",
                  titlefont={'size': 30},
                  template='simple_white'       
                  )
fig.update_traces(marker_line_color='black',
                  marker_line_width=0.5, opacity=0.8)
fig.show()

Number of Restaurants based on Services¶

In [75]:
fig=px.histogram(df['listed_type'], labels = dict(value = 'listed_type'))
fig.update_layout(title="Type of Services",
                  titlefont={'size': 30},template='simple_white'     
                  )
fig.update_traces(marker_color='pink', marker_line_color='black',
                  marker_line_width=2, opacity=1)

fig.show()

Count of Restaurants based on the Cost¶

In [76]:
fig = px.histogram(df['cost_for_2'], labels = dict(value = 'Cost Range'), nbins = 10)
fig.update_layout(title = "Cost of Restaurants",
                  titlefont = {'size': 30}, template = 'simple_white'     
                  )
fig.update_traces(marker_color = 'cyan', marker_line_color = 'black',
                  marker_line_width = 2, opacity = 1)

fig.show()

Top 10 popular Restaurant Chains in Bengaluru¶

In [77]:
chains = df['name'].value_counts()[:10]
fig = px.bar(y = chains, x = chains.index, labels = dict(x = 'Name', y = 'Count'),color_continuous_scale = "Agsunset", color = chains.index)
fig.update_layout(title = "Most famous restaurant chains",
                  titlefont = {'size': 30},template='simple_white'     
                  )
fig.update_traces( marker_line_color = 'black',
                  marker_line_width = 2, opacity=1)

fig.show()

Predicting Rating using Random Forest Regressor¶

In [78]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
In [79]:
df.head()
Out[79]:
url address name online_order book_table rate votes phone location rest_type dish_liked cuisines cost_for_2 reviews_list menu_item listed_type city
0 https://www.zomato.com/bangalore/jalsa-banasha... 942, 21st Main Road, 2nd Stage, Banashankari, ... Jalsa True True 4.1 775 080 42297555\r\n+91 9743772233 Banashankari Casual Dining Pasta, Lunch Buffet, Masala Papad, Paneer Laja... North Indian, Mughlai, Chinese 800.0 [('Rated 4.0', 'RATED\n A beautiful place to ... [] Buffet Banashankari
1 https://www.zomato.com/bangalore/spice-elephan... 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... Spice Elephant True False 4.1 787 080 41714161 Banashankari Casual Dining Momos, Lunch Buffet, Chocolate Nirvana, Thai G... Chinese, North Indian, Thai 800.0 [('Rated 4.0', 'RATED\n Had been here for din... [] Buffet Banashankari
2 https://www.zomato.com/SanchurroBangalore?cont... 1112, Next to KIMS Medical College, 17th Cross... San Churro Cafe True False 3.8 918 +91 9663487993 Banashankari Cafe, Casual Dining Churros, Cannelloni, Minestrone Soup, Hot Choc... Cafe, Mexican, Italian 800.0 [('Rated 3.0', "RATED\n Ambience is not that ... [] Buffet Banashankari
3 https://www.zomato.com/bangalore/addhuri-udupi... 1st Floor, Annakuteera, 3rd Stage, Banashankar... Addhuri Udupi Bhojana False False 3.7 88 +91 9620009302 Banashankari Quick Bites Masala Dosa South Indian, North Indian 300.0 [('Rated 4.0', "RATED\n Great food and proper... [] Buffet Banashankari
4 https://www.zomato.com/bangalore/grand-village... 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... Grand Village False False 3.8 166 +91 8026612447\r\n+91 9901210005 Basavanagudi Casual Dining Panipuri, Gol Gappe North Indian, Rajasthani 600.0 [('Rated 4.0', 'RATED\n Very good restaurant ... [] Buffet Banashankari
  • dish_liked = df['dish_liked']
  • dish_liked_split = dish_liked.str.split(',', expand=True)
  • dish_liked_split.columns = [f"{i}" for i in range(dish_liked_split.shape[1])]
  • dish_liked_encoded = pd.get_dummies(dish_liked_split)
  • df_encoded1 = pd.concat([df, dish_liked_encoded], axis=1)

The code is logical and perfectly runnable, but through this it is creating 7777 columns (this is because there is review sentences present in the column). If I remove the rows having the review sentences I will be losing much precious data. So for now I am not running the code and if I run the system is taking lot of time in running below cells and needs more computational power. That is why I am not including it.

In [80]:
cuisines = df['cuisines']
cuisines_split = cuisines.str.split(',', expand=True)
cuisines_split.columns = [f"{i}" for i in range(cuisines_split.shape[1])]
cuisines_encoded = pd.get_dummies(cuisines_split)
df_encoded1 = pd.concat([df, cuisines_encoded], axis=1)
In [81]:
df_encoded1['count_dish_liked'] = 0
for index, row in df_encoded1.iterrows():
    if isinstance(row['dish_liked'], float):
        # If cuisines column contains NaN values, set cuisine count to 0
        dish_liked_count = 0
    else:
        dish_liked_list = row['dish_liked'].split(',')
        dish_liked_count = len(dish_liked_list)
    df_encoded1.at[index, 'count_dish_liked'] = dish_liked_count
In [82]:
df_encoded1['count_rest_type'] = 0
for index, row in df_encoded1.iterrows():
    if isinstance(row['rest_type'], float):
        # If cuisines column contains NaN values, set cuisine count to 0
        rest_type_count = 0
    else:
        rest_type_list = row['rest_type'].split(',')
        rest_type_count = len(rest_type_list)
    df_encoded1.at[index, 'count_rest_type'] = rest_type_count
In [83]:
df_encoded1
Out[83]:
url address name online_order book_table rate votes phone location rest_type ... 7_ Kerala 7_ North Indian 7_ Pizza 7_ Rolls 7_ Salad 7_ Seafood 7_ South Indian 7_ Thai count_dish_liked count_rest_type
0 https://www.zomato.com/bangalore/jalsa-banasha... 942, 21st Main Road, 2nd Stage, Banashankari, ... Jalsa True True 4.1 775 080 42297555\r\n+91 9743772233 Banashankari Casual Dining ... 0 0 0 0 0 0 0 0 7 1
1 https://www.zomato.com/bangalore/spice-elephan... 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... Spice Elephant True False 4.1 787 080 41714161 Banashankari Casual Dining ... 0 0 0 0 0 0 0 0 7 1
2 https://www.zomato.com/SanchurroBangalore?cont... 1112, Next to KIMS Medical College, 17th Cross... San Churro Cafe True False 3.8 918 +91 9663487993 Banashankari Cafe, Casual Dining ... 0 0 0 0 0 0 0 0 7 2
3 https://www.zomato.com/bangalore/addhuri-udupi... 1st Floor, Annakuteera, 3rd Stage, Banashankar... Addhuri Udupi Bhojana False False 3.7 88 +91 9620009302 Banashankari Quick Bites ... 0 0 0 0 0 0 0 0 1 1
4 https://www.zomato.com/bangalore/grand-village... 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... Grand Village False False 3.8 166 +91 8026612447\r\n+91 9901210005 Basavanagudi Casual Dining ... 0 0 0 0 0 0 0 0 2 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41660 https://www.zomato.com/bangalore/the-farm-hous... 136, SAP Labs India, KIADB Export Promotion In... The Farm House Bar n Grill False False 3.7 34 +91 9980121279\n+91 9900240646 Whitefield Casual Dining, Bar ... 0 0 0 0 0 0 0 0 1 2
41661 https://www.zomato.com/bangalore/bhagini-2-whi... 139/C1, Next To GR Tech Park, Pattandur Agraha... Bhagini False False 2.5 81 080 65951222 Whitefield Casual Dining, Bar ... 0 0 0 0 0 0 0 0 2 2
41662 https://www.zomato.com/bangalore/best-brews-fo... Four Points by Sheraton Bengaluru, 43/3, White... Best Brews - Four Points by Sheraton Bengaluru... False False 3.6 27 080 40301477 Whitefield Bar ... 0 0 0 0 0 0 0 0 1 1
41663 https://www.zomato.com/bangalore/chime-sherato... Sheraton Grand Bengaluru Whitefield Hotel & Co... Chime - Sheraton Grand Bengaluru Whitefield Ho... False True 4.3 236 080 49652769 ITPL Main Road, Whitefield Bar ... 0 0 0 0 0 0 0 0 3 1
41664 https://www.zomato.com/bangalore/the-nest-the-... ITPL Main Road, KIADB Export Promotion Industr... The Nest - The Den Bengaluru False False 3.4 13 +91 8071117272 ITPL Main Road, Whitefield Bar, Casual Dining ... 0 0 0 0 0 0 0 0 1 2

41418 rows × 496 columns

In [84]:
df_encoded1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41418 entries, 0 to 41664
Columns: 496 entries, url to count_rest_type
dtypes: bool(2), float64(2), int64(3), object(12), uint8(477)
memory usage: 25.6+ MB
In [85]:
df_encoded1.to_csv('Revised_Zomato.csv', index = False)
In [86]:
data = df_encoded1
In [87]:
data.drop(['url', 'address', 'name', 'phone', 'menu_item','rest_type', 'cuisines', 'dish_liked', 'reviews_list'], axis=1, inplace=True)
In [88]:
def convert_string_to_num(col):
    values = col.unique()
    key = {}
    for i, val in enumerate(values):
        key[val] = i
    col = col.map(key)
    return col

data['online_order'] = convert_string_to_num(data['online_order'])
data['book_table'] = convert_string_to_num(data['book_table'])
data['location'] = convert_string_to_num(data['location'])
data['listed_type'] = convert_string_to_num(data['listed_type'])
data['city'] = convert_string_to_num(data['city'])
In [89]:
data.head()
Out[89]:
online_order book_table rate votes location cost_for_2 listed_type city 0_ 0_African ... 7_ Kerala 7_ North Indian 7_ Pizza 7_ Rolls 7_ Salad 7_ Seafood 7_ South Indian 7_ Thai count_dish_liked count_rest_type
0 0 0 4.1 775 0 800.0 0 0 0 0 ... 0 0 0 0 0 0 0 0 7 1
1 0 1 4.1 787 0 800.0 0 0 0 0 ... 0 0 0 0 0 0 0 0 7 1
2 0 1 3.8 918 0 800.0 0 0 0 0 ... 0 0 0 0 0 0 0 0 7 2
3 1 1 3.7 88 0 300.0 0 0 0 0 ... 0 0 0 0 0 0 0 0 1 1
4 1 1 3.8 166 1 600.0 0 0 0 0 ... 0 0 0 0 0 0 0 0 2 1

5 rows × 487 columns

In [90]:
# Split data into training and testing sets
X = data.drop(['rate'], axis=1)
y = data['rate']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

# Train the model
rf = RandomForestRegressor(n_estimators = 100, random_state = 42)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error: {:.2f}".format(mse))
print("R2 Score: {:.2f}".format(r2))
Mean Squared Error: 0.02
R2 Score: 0.92
In [91]:
plt.scatter(rf.predict(X_train), rf.predict(X_train) - y_train, 
            color = "green", s = 10, label = "Train data")

plt.scatter(rf.predict(X_test), rf.predict(X_test) - y_test, 
            color = "blue", s = 10, label = "Test data")

plt.legend(loc = "upper right")

plt.title("Residual errors")

plt.show()
In [92]:
residuals = y_test - y_pred
plt.scatter(y_pred, residuals)
plt.xlabel('Predicted Ratings')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.show()
In [93]:
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Ratings')
plt.ylabel('Predicted Ratings')
plt.title('Actual vs Predicted Ratings')
plt.show()
In [94]:
residuals = y_test - y_pred
plt.hist(residuals, bins=30)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Histogram of Residuals')
plt.show()
In [95]:
importances = pd.Series(rf.feature_importances_, index=X.columns)
importances.nlargest(10).plot(kind='barh')
plt.title('Feature Importances')
plt.xlabel('Importance')
plt.ylabel('Features')
plt.show()
In [96]:
data.corr()
Out[96]:
online_order book_table rate votes location cost_for_2 listed_type city 0_ 0_African ... 7_ Kerala 7_ North Indian 7_ Pizza 7_ Rolls 7_ Salad 7_ Seafood 7_ South Indian 7_ Thai count_dish_liked count_rest_type
online_order 1.000000 -0.054771 -0.069354 0.013319 0.049634 -0.179486 0.239442 0.054101 -0.010046 -0.012807 ... -0.013290 0.019231 -0.004991 -0.005023 0.012428 -0.007103 -0.011232 0.009615 -0.089042 0.057937
book_table -0.054771 1.000000 -0.426095 -0.393434 -0.032901 0.266558 -0.114141 -0.029076 0.005889 -0.011464 ... 0.007791 -0.008621 -0.025116 0.002944 -0.028408 -0.023195 0.006585 -0.016401 -0.441302 -0.225198
rate -0.069354 -0.426095 1.000000 0.434764 0.032485 -0.115575 0.033588 0.024561 -0.008686 0.035868 ... -0.005678 0.003932 0.025919 0.009457 0.013649 0.011144 -0.003539 0.006303 0.600934 0.174914
votes 0.013319 -0.393434 0.434764 1.000000 0.007221 -0.116102 0.070343 0.026530 -0.005351 0.002553 ... 0.001125 -0.003938 0.003565 -0.001002 0.005231 0.018018 -0.004749 0.007788 0.438448 0.201077
location 0.049634 -0.032901 0.032485 0.007221 1.000000 -0.019359 0.040580 0.359206 0.000783 -0.011333 ... 0.010554 -0.014721 -0.004663 -0.004576 -0.015641 -0.008894 -0.000274 -0.000465 0.027271 0.021978
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7_ Seafood -0.007103 -0.023195 0.011144 0.018018 -0.008894 -0.014954 -0.010982 -0.009201 -0.000137 -0.000174 ... -0.000181 -0.000137 -0.000181 -0.000068 -0.000118 1.000000 -0.000153 -0.000068 0.012592 -0.004200
7_ South Indian -0.011232 0.006585 -0.003539 -0.004749 -0.000274 0.001945 -0.000094 0.002178 -0.000216 -0.000275 ... -0.000286 -0.000216 -0.000286 -0.000108 -0.000187 -0.000153 1.000000 -0.000108 -0.002900 -0.006641
7_ Thai 0.009615 -0.016401 0.006303 0.007788 -0.000465 -0.010559 0.010058 -0.003027 -0.000097 -0.000123 ... -0.000128 -0.000097 -0.000128 -0.000048 -0.000084 -0.000068 -0.000108 1.000000 0.008904 -0.002969
count_dish_liked -0.089042 -0.441302 0.600934 0.438448 0.027271 0.012874 0.046316 0.013475 -0.012795 0.022703 ... 0.023560 -0.001319 0.023560 0.008904 0.015422 0.012592 -0.002900 0.008904 1.000000 0.153120
count_rest_type 0.057937 -0.225198 0.174914 0.201077 0.021978 -0.164643 0.065063 0.028248 -0.005939 -0.007572 ... -0.007858 -0.005939 0.035763 -0.002969 -0.005143 -0.004200 -0.006641 -0.002969 0.153120 1.000000

487 rows × 487 columns

Predicting Rating using Deep Learning Model - TensorFlow¶

In [97]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

# Scale numerical features
scaler = StandardScaler()
data[['cost_for_2', 'votes']] = scaler.fit_transform(data[['cost_for_2', 'votes']])
In [98]:
# Split the data into training and testing sets
X = data.drop(['rate'], axis=1)
y = data['rate']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
In [99]:
# Define the model
model = Sequential()
model.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
In [100]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 128)               62336     
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dropout_1 (Dropout)         (None, 64)                0         
                                                                 
 dense_2 (Dense)             (None, 32)                2080      
                                                                 
 dropout_2 (Dropout)         (None, 32)                0         
                                                                 
 dense_3 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 72,705
Trainable params: 72,705
Non-trainable params: 0
_________________________________________________________________
In [101]:
# Compile the model
model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=0.01))
In [102]:
# Define early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=5)

# Fit the model
history = model.fit(X_train, y_train, epochs=20, batch_size=64, validation_data=(X_test, y_test), callbacks=[early_stopping])
Epoch 1/20
453/453 [==============================] - 6s 10ms/step - loss: 0.6735 - val_loss: 0.1122
Epoch 2/20
453/453 [==============================] - 4s 9ms/step - loss: 0.2021 - val_loss: 0.1079
Epoch 3/20
453/453 [==============================] - 4s 9ms/step - loss: 0.1588 - val_loss: 0.1048
Epoch 4/20
453/453 [==============================] - 5s 10ms/step - loss: 0.1302 - val_loss: 0.1068
Epoch 5/20
453/453 [==============================] - 4s 9ms/step - loss: 0.1096 - val_loss: 0.0971
Epoch 6/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0995 - val_loss: 0.0952
Epoch 7/20
453/453 [==============================] - 4s 10ms/step - loss: 0.0954 - val_loss: 0.0973
Epoch 8/20
453/453 [==============================] - 4s 10ms/step - loss: 0.0928 - val_loss: 0.0924
Epoch 9/20
453/453 [==============================] - 6s 14ms/step - loss: 0.0925 - val_loss: 0.0938
Epoch 10/20
453/453 [==============================] - 4s 10ms/step - loss: 0.0913 - val_loss: 0.0887
Epoch 11/20
453/453 [==============================] - 7s 15ms/step - loss: 0.0912 - val_loss: 0.0880
Epoch 12/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0908 - val_loss: 0.0904
Epoch 13/20
453/453 [==============================] - 5s 11ms/step - loss: 0.0906 - val_loss: 0.0925
Epoch 14/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0899 - val_loss: 0.0863
Epoch 15/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0912 - val_loss: 0.0905
Epoch 16/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0903 - val_loss: 0.0920
Epoch 17/20
453/453 [==============================] - 5s 10ms/step - loss: 0.0911 - val_loss: 0.0896
Epoch 18/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0909 - val_loss: 0.0899
Epoch 19/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0922 - val_loss: 0.0938
In [103]:
# Plot the evaluation graph
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Evaluation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

Recommendation System¶

In [104]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

df = pd.read_csv('zomato.csv', encoding='ISO-8859-1')

df['approx_cost(for two people)'] = df['approx_cost(for two people)'].astype(str)
df['approx_cost(for two people)'] = df['approx_cost(for two people)'].apply(lambda x: x.replace(',','.'))
df['approx_cost(for two people)'] = df['approx_cost(for two people)'].astype(float)
In [105]:
def get_user_input():
    cuisine_type = input("What type of cuisine are you in the mood for? ")
    location = input("Where are you located? ")
    cost = float(input("What is your budget for two people? "))
    try:
        cost = float(cost)
    except ValueError:
        print("Invalid budget input. Defaulting to 500.")
        cost = 500
    return cuisine_type, location, cost

def filter_restaurants(cuisine_type, location, cost):
    # Filter the restaurants based on cuisine type, location, and cost
    filtered_data = df[(df['cuisines'].str.contains(cuisine_type, na=False)) &
                       (df['location'].str.contains(location, na=False)) &
                       (df['approx_cost(for two people)'] <= cost)]
    return filtered_data

def calculate_similarity(vec1, vec2):
    # Calculate the cosine similarity between two vectors
    return cosine_similarity(vec1.reshape(1, -1), vec2.reshape(1, -1))[0][0]

def recommend_restaurants(cuisine_type, location, cost):
    # Filter the restaurants
    filtered_data = filter_restaurants(cuisine_type, location, cost)
    
    # Calculate the user input vector
    user_input_vec = np.zeros(len(filtered_data.columns) - 1)
    user_input_vec[-1] = cost
    
    # Calculate the similarity between the user input vector and each restaurant vector
    similarities = []
    for i in range(len(filtered_data)):
        restaurant_vec = np.zeros(len(filtered_data.columns) - 1)
        restaurant_cuisine_types = filtered_data.iloc[i]['cuisines'].split(', ')
        for cuisine_type in restaurant_cuisine_types:
            if cuisine_type in filtered_data.columns:
                cuisine_type_index = np.where(filtered_data.columns==cuisine_type)[0][0]
                restaurant_vec[cuisine_type_index] = 1
        restaurant_cost = filtered_data.iloc[i]['approx_cost(for two people)']
        restaurant_vec[-1] = restaurant_cost
        similarity = calculate_similarity(user_input_vec[1:], restaurant_vec[1:])
        similarities.append((i, similarity))
    
    # Sort the restaurants based on similarity and return the top 10 recommendations
    similarities.sort(key=lambda x: x[1], reverse=True)
    recommendations = filtered_data.iloc[[x[0] for x in similarities[:10]]]
    return recommendations[['name', 'location', 'cuisines', 'rate']]

# Get user input and recommend restaurants
cuisine_type, location, cost = get_user_input()
recommendations = recommend_restaurants(cuisine_type, location, cost)
print('Recommended restaurants:')
print(recommendations)
What type of cuisine are you in the mood for? North Indian
Where are you located? BTM
What is your budget for two people? 500
Recommended restaurants:
                    name location  \
922              eat.fit      BTM   
928  Hiyar Majhe Kolkata      BTM   
932    Sri Lakshmi Dhaba      BTM   
934       Swadista Aahar      BTM   
940       Swad Punjab Da      BTM   
942            Roti Wala      BTM   
946          Apna Punjab      BTM   
947     Paratha Junction      BTM   
952          Kullad Cafe      BTM   
954          Litti Twist      BTM   

                                              cuisines   rate  
922  Healthy Food, North Indian, Biryani, Continent...  4.5/5  
928                              Bengali, North Indian  4.0/5  
932                                       North Indian  2.9/5  
934   South Indian, North Indian, Chinese, Street Food  4.1/5  
940                                       North Indian  4.0/5  
942                                       North Indian  4.0/5  
946                   North Indian, Chinese, Fast Food  3.6/5  
947                              North Indian, Chinese  2.9/5  
952           North Indian, Cafe, Fast Food, Beverages  3.9/5  
954                               North Indian, Bihari  4.1/5  
Done by: Dhyey Chauhan, Harjot Singh Bali and Nishit Rathod¶